Poster: Source Code Authorship Attribution

Authors

  • Aylin Caliskan Islam
  • Rachel Greenstadt
Abstract

As information becomes widely available and easily accessible through the Internet and other sources, plagiarism has been increasing. Plagiarism and copyright infringement arise in both academic and corporate environments, and author classification techniques are needed to deter such violations. Source code is also intellectual property and reflects individual style, so it is important to be able to identify its author. A tool that detects the author of a program automatically helps resolve copyleft, copyright, and plagiarism issues in programming. Making authorship attribution tools publicly available will also raise awareness and reduce any tendency to plagiarize.

Similar resources

Authorship attribution of source code by using back propagation neural network based on particle swarm optimization

Authorship attribution identifies the most likely author of a given sample among a set of known candidate authors. It can be applied not only to discover the original author of plain text, such as novels, blogs, emails, and posts, but also to identify the programmers of source code. Authorship attribution of source code is required in diverse applications, ranging from malicious code trackin...

Application of Information Retrieval Techniques for Source Code Authorship Attribution

Authorship attribution assigns works of contentious authorship to their rightful owners solving cases of theft, plagiarism and authorship disputes in academia and industry. In this paper we investigate the application of information retrieval techniques to attribution of authorship of C source code. In particular, we explore novel methods for converting C code into documents suitable for retrie...

Source Code Authorship Attribution using n-grams

Plagiarism and copyright infringement are major problems in academic and corporate environments. Existing solutions for detecting infringements in structured text such as source code are restricted to textual similarity comparisons of two pieces of work. In this paper, we examine authorship attribution as a means for tackling plagiarism detection. Given several samples of work from several auth...

Source Code Authorship Attribution Using Long Short-Term Memory Based Networks

Machine learning approaches to source code authorship attribution attempt to find statistical regularities in human-generated source code that can identify the author or authors of that code. This has applications in plagiarism detection, intellectual property infringement, and post-incident forensics in computer security. The introduction of features derived from the Abstract Syntax Tree (AST)...

When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries

The ability to identify authors of computer programs based on their coding style is a direct threat to the privacy and anonymity of programmers. Previous work has examined attribution of authors from both source code and compiled binaries, and found that while source code can be attributed with very high accuracy, the attribution of executable binary appears to be much more difficult. Many pote...

Publication date: 2013